Copyright © 1975 American Telephone and Telegraph Company
The Bell System Technical Journal
Safeguard Supplement
Printed in V.S.A.
SAFEGUARD Data-Processing System:
SAFEGUARD Data Reduction System
By E. S. HOOVER and R. A. JACOBY
(Manuscript received January 3, 1975)
In evaluating and certifying the SAFEGUARD ABM system, it ivas neces-
sary to interrogate and analyze the massive volume of data generated during
tests. The number of different reports, listings, and plots was so large and
the variety so great that a flexible data reduction system had to be placed
at the disposal of the user community. At the same time, the system had
to be highly efficient and quickly available to be used at all. The SAFE-
GUARD Data Reduction System was designed to accomplish these objectives.
I. INTRODUCTION
Si nee any operational system must be tested, certified, and evalu-
ated, provision of the means for doing so must be part of its design.
The first step in certifying that a process is performing as specified
is the recording of certain significant data during test runs. These data
must be reduced and presented in a variety of ways. The Safeguard
Data Reduction System (sdrs) fills this role by providing a flexible
and highly efficient facility to serve the needs of the test teams.
The fundamental capabilities of sdrs had to be available when the
testing began. The design for the real-time recording programs and
the data reduction programs had to be coordinated, since the recording
program serves as input to the reduction program. Short reduction
program development schedules made necessary the use of certain
preexisting designs and code, which had to be worked into the result-
ing system without compromising the other requirements. To accom-
plish this, a number of deliveries were planned, starting with the
simplest and most basic features. Users who tried the first system were
able to give sdrs designers useful feedback for future deliveries.
This paper discusses the experience gained in formulating require-
ments, organizing the program, developing the facility, and interacting
with users.
S181
II. REQUIREMENTS FOR THE SAFEGUARD DATA REDUCTION SYSTEM
To test, certify, and evaluate the behavior of a Safeguard pro-
cess, it is necessary to record data from memory during the real-
time execution of the process. Debugging and integrating can con-
ceivably be accomplished by stopping the process and taking post-
mortem memory dumps, but this destroys the real-time sequence of
events. Since the Safeguard processes contain hundreds of thousands
of lines of code, testing would be exceedingly cumbersome if it were
necessary to stop the test to record data. Because calls to the recording
subroutine are planned for and remain always in the code, a wealth
of internal data can be recorded without disturbing the real-time be-
havior of the process. Recording occupies some cpu time but, aside
from this effect, the process performs in the same way whether or not
recording occurs. Real-time recording is essential for a practical testing
program.
2.1 Requirements for recording
Thousands of events for which data might be taken occur during a
test. The specific data needed depend on the purpose of the test. The
clc operating system 1 allows the applications process to record up to
100,000 32-bit words of data per second, as many as eight reels of tape
during a 16-minute test.
Unless care is taken in recording design, even this large capacity
can be exceeded. Designers tend to do more recording than is needed.
This happens because the recording decision must be made long before
testing begins. By recording almost everything, designers protect
against overlooking some data items that may be wanted. As a result,
a burden is placed on data reduction to select from a large mass of data
only those items the user needs.
The data as recorded by the clc operating system are organized
into physical records of variable length. Each physical record contains
a header and one or more logical records. The header preceding each
logical record categorizes the data to follow. Records of one type
would contain the most common and essential items, while records of
another type might contain more voluminous data.
2.2 SDRS requirements
Consideration of the content, structure, and volume of the input
and the expected use for the output dictate requirements for the data
reduction system. Very large quantities of data are recorded from
over 1000 different data sets.
Many data structures are implemented in the processes, but four
typical ones were selected for processing by sdrs. Many more com-
plicated structures, if properly recorded, can be handled by sdrs.
S182 THE BELL SYSTEM TECHNICAL JOURNAL, SAFEGUARD
To be correctly interpreted by sdrs, the physical attributes (floating
point, integer, etc.) of each item of recorded data must be defined.
Since a majority of these items must also be defined for centran
compilations, sdrs avoids possible incompatibilities by accessing the
centran declarations.
To reduce data items not defined in centran and to allow quick
response to patches in the clc applications processes, sdrs also pro-
vides utilities to define and modify attributes. Experience with earlier
data reduction systems, which provided only manual methods of data
attribute definition, led to the requirement for both automated and
manual methods.
There is a need to format the data so that they may be interpreted
with ease. Some factors to be considered are discussed here.
Ease of interpretation of the recorded data requires that methods
be supplied for selection of only the necessary subset of the data for
presentation. This allows the user to generate exception reports and
summaries rather than printing or plotting every data value.
Raw data may be recorded in one form, but they may be much more
useful in another. Presentation in engineering units is often helpful.
User-defined computations can be made by sdrs to facilitate evalua-
tion of data.
In some cases, related data are scattered over a series of records
and even over a series of tapes. The difficult task of correlation is per-
formed by a file handler that can associate data for a user.
Some forms of presenting data are more useful than others. Since
it is impossible to predict what particular listing or plotting form will
best serve a given user, the users need the ability to format their own
reports for presentation. Specifically, the users choose from among
four basic ways to present data : formatted reports, tabular listings,
line and point plots, and histograms. Users specify titles, subtitles,
column headings, plotting axes, and scaling parameters for plots.
To gain user acceptance, sdrs (which is not a real-time facility)
had to provide features that could satisfy quick turnaround time re-
quirements with minimum effort.
Since most users developed their requirements as they began testing,
sdrs had to provide users with plots and tabular listings which they
could then easily modify to suit their later needs, sdrs provides users
with a high-level command language in which to specify more com-
plicated reduction requirements. Sufficient default conditions are
supplied so that a simple set of user commands will result in a listing
of all data items in a logical record.
One group of five users designed 737 tabular listings and reports in
12 months, averaging 12 per man- week. The elapsed time to get a
simple tabular listing to work was about two days.
DATA REDUCTION SYSTEM S183
Because thousands of corrections had to be made to applications
processes, there was a premium on quick turnaround time in processing
the recording tapes. Since many testing teams submitted reduction
runs simultaneously, it was essential that sdrs process a large volume
of records efficiently.
It would be possible to reduce data on the clc (this will be done
during the post-installation and test phase). However, the clc instal-
lations are limited in number, and time on the clc is normally devoted
to testing. Therefore, a decision was made early that data reduction
should be done off-line on commercially available computers. Testing
is performed for Safeguard in several locations. In each of these
locations, an off-line computer of the IBM 360/370 series is used for
data reduction. The decision to use an off-line facility was correct.
Time on the test machine is limited and is in much demand for the
primary testing task.
A facility such as sdrs cannot be developed with all capabilities
operational at the time an initial capability is needed by users. The
designer can capitalize on this. If he designs a modular system and an
open-ended user command language, he can first deliver a simple sys-
tem. Feedback after the users have tried the first system can improve
the design of later extensions. In fact, while users were presented with
a proposal for sdrs and were invited to give their comments, most
suggestions were obtained only after the first version of the system
was working.
2.3 What was learned in setting requirements
The needs of users with a great many reduction requests to main-
tain were not foreseen. Since sdrs made it easy to request a large
number of different reductions, the administration of requests re-
quired automation. One user group, supporting a single process inte-
gration, generated over 7000 sdrs statements. User groups usually
solved this problem by developing their own administrative programs.
It was important to restrict users to one specific command language
to give them the ability to turn out data reduction requests quickly.
Some users, familiar with Fortran or pl/i, undoubtedly would have
liked full control over program execution and the use of data types
available in those languages. Restricting the number of data types
supported increased the speed with which the system was developed.
III. SYSTEM ORGANIZATION
3.1 Design considerations
The development of a system organization and design philosophy
for sdrs was influenced by a variety of factors. These included user
S184 THE BELL SYSTEM TECHNICAL JOURNAL, SAFEGUARD
requirements, schedule constraints, experience of the designers, and
successes and failures of earlier systems.
The designs of several previous data retrieval and analysis systems
were analyzed to determine their applicability to the system require-
ments. This analysis uncovered serious shortcomings in most of these
systems for the high data volume requirements of Safeguard; how-
ever, several common strong points were noted in each.
A general solution to the data correlation problem was attempted
with the Mission Data Reduction (mdr) system, which was developed
for use in the Meek test system. The significant features of this system
were a command language user interface, general data sorting capa-
bilities, general data conversion capabilities, and data presentation
capabilities in the form of reports, plots, and tabular listings. The
major shortcomings of mdr were its complete dependence upon sorted
data to produce any output as well as a requirement to convert all
data before selecting the subset of interest. Both these characteristics
introduced exorbitant overhead for sequential processing.
A more specific approach to the problem was used in the. systems
that were successors to mdr. These systems added a data attribute
dictionary, an efficient sequential-file data extractor, and specific
data correlation capabilities in the form of special-purpose subroutines
to the basic capabilities of mdr. The major shortcomings of these
systems were limited selectivity during the extraction phase, a re-
quirement to convert a large percentage of the total data before select-
ing the subset of interest, and limited file generation capabilities that
required many passes over the raw data to extract all data of interest.
An additional shortcoming was a dependence upon manual main-
tenance methods for the data attribute dictionaries.
A common factor in each of these systems was uncovered : The
designers had little or no control over the format of the data files
that they were required to process. In general, the files were written
without regard to the eventual processing and correlation requirements.
With few exceptions, these characteristics of the data to be processed
introduced a great deal of complexity and overhead into the systems.
Once it had been determined that none of the available systems
would meet the requirements adequately, a design was proposed that
would provide an initial capability with incremental growth potential.
The user community required deliveiy of the first release within nine
months and incremental releases at two-month intervals.
To meet these schedules, it was necessary to achieve a balance be-
tween the development of new programs and the adaptation of existing
programs. The advantages of short development time offered by use
of existing programs had to be weighed against their extendibility,
DATA REDUCTION SYSTEM S185
flexibility, and maintainability. In addition, the efficiency requirements
for the data presentation capabilities had to be considered.
The requirements obtained from the user community were analyzed
to determine possible commonality. An attempt was made to deter-
mine the general characteristics of the data that would be generated
by the users. An estimate of the probable volume of data to be gen-
erated by these users was also made.
All users requested the same basic capabilities. These included
printing reports in a variety of forms, correlating and sorting data on
a variety of criteria, plotting data, and specifying the conditions under
which this processing was to be done. The large number of special
processing requests made it obvious that the development of a general-
purpose facility was necessary.
3.2 Basic functional components
The system consists of four basic functional components as in-
dicated in Fig. 1 :
(i) The data attribute definition component defines the charac-
teristics of data items by examining their centran declarations.
(it) The sequential data base retrieval component provides data
collection, selection, and presentation capabilities for sequen-
tially organized data files.
(in) The hierarchical data base generation component allows the
relatively efficient creation of direct access data files.
(iv) The hierarchical data base retrieval component provides data
collection and selection capabilities as well as sequential data
base generation capabilities for direct access data files.
3.3 Lessons learned
The overall efficiency of sdrs is difficult to measure since the users
of the system specify what the system must process. This introduces
into the evaluation of sdrs performance such factors as user expertise,
user knowledge of data characteristics, and user analysis of needs.
Several design decisions were made to minimize the impact of these
factors on system performance.
Provision of methods for the user to perform many data presenta-
tion operations on a single data retrieval pass was the primary charac-
teristic of the design that provided efficient processing capabilities.
This approach, although somewhat obvious for processing sequential
data bases, is equally applicable to the processing of direct-access
data bases. This is true because minimization of the number of times
data are retrieved will minimize elapsed time and system overhead.
S186 THE BELL SYSTEM TECHNICAL JOURNAL, SAFEGUARD
SEQUENTIAL
DATA-BASE
RETRIEVAL
COMPONENT
i DATA BASE i
USER
LANGUAGE
^
SEQUENTIAL \
„
DATA
ATTRIBUTE
DEFINITION
COMPONENT
HIERARCHICAL
DATA-BASE
GENERATION
COMPONENT
USER
REQUESTS
/hierarchical^
i data base j
\
HIERARCHICAL
DATA-BASE
RETRIEVAL
COMPONENT
/^
s-
LANGUAGE
/
\
Fig. 1 — Basic functional components.
Structured data recording makes it feasible to develop a simple
efficient data-filtering algorithm. This algorithm enables sdrs to dis-
card all records not requested by a user by interrogating fields in the
header and ignoring all data in the record.
Limiting the number of data structures supported makes it feasible
to design algorithms that extract and convert a minimum amount of
data. Minimizing the number of data conversions was especially critical
in the sequential data base retrieval module because of the large volume
of data. In this module, data conversion is delayed until after all user
conditions had been satisfied.
Assembly language was used in coding those critical paths of the
system that would process large volumes of data. The extra time re-
quired to develop these programs was offset by the increased efficiency
derived from this approach.
IV. DEVELOPMENT CONSIDERATIONS
The development of sdrs and the delivery of the system to the user
community with capabilities consistent with the requirements were
DATA REDUCTION SYSTEM S187
accomplished on short schedules. Several development procedures and
techniques were used by the design group that may be applicable to
other development efforts faced with similar problems.
The critical need of many users for a simple data-printing capability
was the basis for the design of the first release of SDKS. This release
consisted of the basic versions of the data attribute definition com-
ponent and the sequential data base retrieval component.
Although simple from a user capability point of view, this initial
release of the system was designed to be extendible. Emphasis was
placed on development of a design that would allow inclusion of addi-
tional processing capabilities without major perturbations.
The development of an outline for further functional capabilities
was begun in parallel with the development of the initial system. This
outline served as a vehicle for planning capability development and
delivery.
Formal design specifications for each system component were not
written. However, detailed interface specifications were developed.
This made it possible for individual routines to be designed in parallel.
The development of sdrs on schedule would not have been possible
without the use of a time-sharing system. Although time sharing is
relatively expensive, development times can be minimized when rapid
correction of troubles and extensive testing are required.
The size of the system required that extensive testing be done to
verify system performance. Although testing is possible in a batch
environment, the effectiveness of the system test team was greatly
enhanced by the availability of immediate test results and on-line de-
bugging capabilities.
V. USER INTERACTION
Before sdrs was designed, users were asked to submit their require-
ments. The sdrs design group then proposed to users an initial set
of requirements. Only after the initial release was meaningful user
feedback received. Whenever possible, suggested improvements were
incorporated into subsequent versions.
A system as complicated as sdrs requires user education. Two
methods were used for this purpose : A user's manual was written and
counselors were provided. The role of the counselors was to teach cor-
rect and efficient sdrs use and to collect feedback for improvements.
The final service to users is proper test and maintenance of the
system. Users were not asked to be guinea pigs. They were allowed
to try a new sdrs only after a complete set of tests were run. During
two years of use, only 81 troubles were encountered in 105,000 lines
S188 THE BELL SYSTEM TECHNICAL JOURNAL, SAFEGUARD
of source. Because of the error messages and the modularity of the
system, it was easy to identify and fix problems.
Total effort expended in user services has been IS percent of the
manpower of the SDRS group. This is considered a minimum effective
support level.
VI. CONCLUSION
The primary lesson learned from the development of sons is that
user data base design is critical. Recording and reduction efficiency
is achieved by designing data bases to minimize the requirement for
further correlation and restructuring.
The real achievement of sdrs lies in simultaneously accomplishing
the objectives of flexibility and efficiency. Many systems attain one
goal or the other: sdrs attempted to do both. Two design decisions
contributed to the success of this effort. First, recorded data not
wanted by the user are ignored by the system. Second, once data are
retrieved, they are processed in as many ways as needed.
REFERENCE
1. J. P. Haggerty, "Safeguard Data-Processing System: Central Logic and Control
Operating System," B.S.T.J., this issue, pp. S89-S99.
DATA REDUCTION SYSTEM S189